Research on Chinese discourse rhetorical structure representation scheme and corpus annotation

نویسنده

  • Guodong Zhou
چکیده

It is well-known that interpretation of a text requires understanding of its rhetorical relation hierarchy since discourse units rarely exist in isolation. Such discourse structure is fundamental to document-level applications, such as text understanding, summarization, knowledge extraction and question-answering. In comparison with English, there are only a few studies on Chinese discourse analysis, due to the lack of appropriate theories to Chinese discourse structure representation and large-scale well-accepted corpora. In this talk, I will present a novel discourse structure representation scheme for Chinese, called Connectivedriven Dependency Tree (CDT), and describe our adventure in corpus annotation of the Chinese Discourse Treebank (CDTB) of 500 documents, using a top-down strategy to keep consistent with Chinese native’s cognitive habit. BIO: Zhou Guodong received the Ph.D. degree in computer science from the National University of Singapore in 1999. He joined the Institute for Infocomm Research, Singapore, in 1999, and had been an associate scientist, scientist and associate lead scientist at the institute until August 2006. Currently, he is a distinguished professor at the School of Computer Science and Technology, Soochow University, Suzhou, China. His research interests include natural language processing, information extraction and machine learning. Currently, he is an associate editor of ACM Transaction on Asian Language Information Processing(2010.07-2016.06), an editorial member of Journal of Software (Chinese)(2012.012014.12) and a vice chair of Technical Committees on Chinese Information/China Computer Federation(2010.12-2016.12), Computational Linguistics/Chinese Information Processing Society of China and Natural Language Understanding/Artificial Intelligence Society of China. Besides, he had been a member of the Editorial Board of Computational Linguistics (2010.01-2012.12).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Building Chinese Discourse Corpus with Connective-driven Dependency Tree Structure

In this paper, we propose a Connectivedriven Dependency Tree (CDT) scheme to represent the discourse rhetorical structure in Chinese language, with elementary discourse units as leaf nodes and connectives as non-leaf nodes, largely motivated by the Penn Discourse Treebank and the Rhetorical Structure Theory. In particular, connectives are employed to directly represent the hierarchy of the tree...

متن کامل

The Third CIPS - SIGHAN Joint Conference on Chinese Language Processing

It is well-known that interpretation of a text requires understanding of its rhetorical relation hierarchy since discourse units rarely exist in isolation. Such discourse structure is fundamental to document-level applications, such as text understanding, summarization, knowledge extraction and question-answering. In comparison with English, there are only a few studies on Chinese discourse ana...

متن کامل

Parallel Discourse Annotations on a Corpus of Short Texts

We present the first corpus of texts annotated with two alternative approaches to discourse structure, Rhetorical Structure Theory (Mann and Thompson, 1988) and Segmented Discourse Representation Theory (Asher and Lascarides, 2003). 112 short argumentative texts have been analyzed according to these two theories. Furthermore, in previous work, the same texts have already been annotated for thei...

متن کامل

Discursive Usage of Six Chinese Punctuation Marks

Both rhetorical structure and punctuation have been helpful in discourse processing. Based on a corpus annotation project, this paper reports the discursive usage of 6 Chinese punctuation marks in news commentary texts: Colon, Dash, Ellipsis, Exclamation Mark, Question Mark, and Semicolon. The rhetorical patterns of these marks are compared against patterns around cue phrases in general. Result...

متن کامل

Annotation upon Annotation: Adding Signalling Information to a Corpus of Discourse Relations

We present an annotation effort that involves adding a new layer of annotation to an existing corpus. We are interested in how rhetorical relations are signalled in discourse, and thus begin with a corpus already annotated for rhetorical relations, to which we add signalling information. We show that a very large number of relations carry signals that can help identify them as such. The detaile...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014